Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(podGC): add Workflow-level DeleteDelayDuration #11325

Merged
merged 5 commits into from
Jul 13, 2023

Conversation

agilgur5
Copy link
Member

@agilgur5 agilgur5 commented Jul 10, 2023

Motivation

  • previously, this could only be set globally for the entire controller

  • now, users can configure this per Workflow as needed / wanted

    • and can override the global setting as desired
    • in particular, this matches how strategy etc can already be set per Workflow
      • so users have a single place to configure all options of podGC
  • in the future (I assume after some deprecation window), the global setting could be removed

    • workflowDefaults can achieve the same thing now

Resolves #9894

  • I took a bit of a different approach from there; with more explicit and consistent naming and typing
    • Using metav1.Duration was more consistent with the global and simplified the logic as well
    • Added more comments and more consistent comments as well

Modifications

  • add DeleteDelayDuration to Workflow-level podGC
    • add to workflow_types.go, then add logic to use this in pod_cleanup.go
    • add to PodGC examples as well
    • +codegen

Verification

So pod_cleanup_test.go actually has no unit test for the whole queuePodsForCleanup() function, so I couldn't just add a unit there. That function could require some mocking to properly test, so I didn't introduce a whole new test suite for that at this point.

Notes to Reviewers

Main issue I have is with the codegen for fields.md, where it explains metav1.Duration a bit misleadingly 😕

  • Everywhere else correctly shows that it boils down to just a string as well, but not this generated section 😕
  • See in-line comment about this

Related, durations seem a bit inconsistently typed in the codebase and spec. Some are ints, some are strings, and others are Durations.

  • Not sure if there's a reason for that? I would guess there is some history?

docs/fields.md Outdated
Comment on lines 5126 to 5143
## Duration

Duration is a wrapper around time.Duration which supports correctmarshaling to YAML and JSON. In particular, it marshals into strings, whichcan be used as map keys in json.

<details>
<summary>Examples with this field (click to open)</summary>
<br>

- [`pod-gc-strategy-with-label-selector.yaml`](https://github.com/argoproj/argo-workflows/blob/master/examples/pod-gc-strategy-with-label-selector.yaml)

- [`pod-gc-strategy.yaml`](https://github.com/argoproj/argo-workflows/blob/master/examples/pod-gc-strategy.yaml)
</details>

### Fields
| Field Name | Field Type | Description |
|:----------:|:----------:|---------------|
|`duration`|`string`|_No description available_|

Copy link
Member Author

@agilgur5 agilgur5 Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the part I find issues with from the codegen. It's missing spaces (between multi-line comments in the Go), for one thing, but also it confusingly lists duration as a sub-field. That is correct for the Go type itself, but it marshals to a string, so the configuration is just a plain string (as the CRDs show).

This doc makes me think I would do:

  podGC:
    deleteDelayDuration:
      duration: 30s

instead of just:

  podGC:
    deleteDelayDuration: 30s

With the second one being how it actually is.

Is there a way to override this or something? Casting somewhere? Not sure

Copy link
Member Author

@agilgur5 agilgur5 Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I was able to fix the spacing by modifying this line in docgen.go, but while fixing this it ends up adding extraneous spaces in several places 😕
The resulting diff for fields.md ends up quite large as well, so I didn't even commit that change

EDIT: was more or less able to fix the spacing issue with another ReplaceAll of multi-space to single space, but still has a big diff not limited to this field. and also doesn't resolve the confusing description here

- previously, this could only be set globally for the entire controller
- now, users can configure this per Workflow as needed / wanted
  - and can override the global setting as desired
  - in particular, this matches how `strategy` etc can already be set per Workflow
    - so users have a single place to configure all options of `podGC`

- in the future (I assume after some deprecation window), the global setting could be removed
  - `workflowDefaults` can achieve the same thing now

Signed-off-by: Anton Gilgur <agilgur5@gmail.com>
Signed-off-by: Anton Gilgur <agilgur5@gmail.com>
this one feels the most potentially problematic, so just split it into its own commit
- though it's still generated, so still impacted by the other commits of this PR

Signed-off-by: Anton Gilgur <agilgur5@gmail.com>
- no more words or sentences joined together with no spaces in between at least
  - but also reduced spaces and increased spaces here and there as a side-effect

Signed-off-by: Anton Gilgur <agilgur5@gmail.com>
Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliev0 Could you help review this?

@agilgur5
Copy link
Member Author

Not sure how codegen is failing, I actually haven't been able to reproduce that locally, even after making sure my deps were updated and re-running several times. Can manually fix that, but odd that my local install is not producing the same output

@juliev0
Copy link
Contributor

juliev0 commented Jul 12, 2023

Not sure how codegen is failing, I actually haven't been able to reproduce that locally, even after making sure my deps were updated and re-running several times. Can manually fix that, but odd that my local install is not producing the same output

I wonder if it could be running a different version of Go from what you are? looks like this is using 1.20: https://github.com/argoproj/argo-workflows/blob/master/.github/workflows/ci-build.yaml#L205

@juliev0
Copy link
Contributor

juliev0 commented Jul 12, 2023

@agilgur5 Looks good. Once you get the CI all passing, let me know and I can approve.

@agilgur5
Copy link
Member Author

I wonder if it could be running a different version of Go from what you are? looks like this is using 1.20: https://github.com/argoproj/argo-workflows/blob/master/.github/workflows/ci-build.yaml#L205

So I'm using the dev container on my current machine, which is also using Go 1.20 😅
Confirmed that too:

vscode ➜ ~/go/src/github.com/argoproj/argo-workflows (feat-workflow-podgc-delay) $ go version
go version go1.20.5 linux/amd64

Strange. Might try rebuilding the dev container from scratch 🤷

- for some reason, I needed a fresh build of the devcontainer to get this to reproduce on my local machine
  - works now 🤷

Signed-off-by: Anton Gilgur <agilgur5@gmail.com>
@agilgur5
Copy link
Member Author

Might try rebuilding the dev container from scratch 🤷

fresh build solved it 🤷

@agilgur5 Looks good. Once you get the CI all passing, let me know and I can approve.

@juliev0 should be good to go now!

@juliev0 juliev0 merged commit 383bb6b into argoproj:master Jul 13, 2023
23 checks passed
tico24 pushed a commit to tico24/argo-workflows that referenced this pull request Jul 13, 2023
Signed-off-by: Anton Gilgur <agilgur5@gmail.com>
Signed-off-by: Tim Collins <tim@thecollins.team>
@agilgur5 agilgur5 deleted the feat-workflow-podgc-delay branch July 13, 2023 14:13
@agilgur5 agilgur5 added area/controller Controller issues, panics area/spec Changes to the workflow specification. labels Aug 27, 2023
@agilgur5 agilgur5 added the area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more label Feb 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more area/spec Changes to the workflow specification.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add delayed delete to podgc to replace global delayed delete (PodGCDeleteDelayDuration)
3 participants